223 research outputs found

    The Data Science Design Manual

    Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues

    We undertake an extensive numerical investigation of the graph spectra of thousands of regular graphs, a set of random Erdős–Rényi graphs, the two most popular types of complex networks, and an evolving genetic network, using novel conceptual and experimental tools. Our objective is to contribute to an understanding of the meaning of the eigenvalues of a graph relative to its topological and information-theoretic properties. We introduce a technique for identifying the most informative eigenvalues of evolving networks by comparing the behavior of graph spectra to their algorithmic complexity. We suggest that extensions of these techniques can be used to further investigate the behavior of evolving biological networks. In the extended version of this paper we apply these techniques to seven tissue-specific regulatory networks as static examples and to the network of a naïve pluripotent immune cell in the process of differentiating towards a Th17 cell as an evolving example, finding the most and least informative eigenvalues at every stage. Comment: Forthcoming in 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), Lecture Notes in Bioinformatics, 201
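    As a small illustration of what computing part of a graph spectrum involves, the sketch below estimates the largest adjacency eigenvalue of a graph by power iteration. It is not the authors' method: the function names `cycle_adjacency` and `largest_adjacency_eigenvalue` are hypothetical, the identity shift is a choice made here so that bipartite graphs also converge, and the graph is assumed to be connected and undirected.

```python
import math


def cycle_adjacency(n):
    """Adjacency matrix of the cycle graph on n nodes (a 2-regular graph)."""
    return [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)]
            for i in range(n)]


def largest_adjacency_eigenvalue(adj, iterations=200):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Assumes a connected, undirected graph. Power iteration is run on A + I
    (an assumption of this sketch) so that the dominant eigenvalue is unique
    in magnitude even when the graph is bipartite.
    """
    n = len(adj)
    # Positive, non-uniform start vector.
    v = [1.0 + i for i in range(n)]
    for _ in range(iterations):
        # One step of (A + I) v, followed by normalisation.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the unit vector v.
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))


if __name__ == "__main__":
    # For a connected k-regular graph the largest eigenvalue is k,
    # so the value printed for a cycle should be close to 2.
    print(largest_adjacency_eigenvalue(cycle_adjacency(6)))
```

    A full spectrum, as studied in the abstract, would normally be obtained with a dense or sparse eigensolver; power iteration is used here only to keep the example dependency-free.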

    When Can You Fold a Map?

    We explore the following problem: given a collection of creases on a piece of paper, each assigned a folding direction of mountain or valley, is there a flat folding by a sequence of simple folds? There are several models of simple folds; the simplest, a one-layer simple fold, rotates a portion of paper about a crease in the paper by ±180 degrees. We first consider the analogous questions in one dimension lower (bending a segment into a flat object), which lead to interesting problems on strings. We develop efficient algorithms for the recognition of simply foldable 1D crease patterns and for the reconstruction of a sequence of simple folds. Indeed, we prove that a 1D crease pattern is flat-foldable by any means precisely if it is flat-foldable by a sequence of one-layer simple folds. Next we explore simple foldability in two dimensions and find a surprising contrast: "map" folding and variants are polynomial, but slight generalizations are NP-complete. Specifically, we develop a linear-time algorithm for deciding foldability of an orthogonal crease pattern on a rectangular piece of paper, and prove that it is (weakly) NP-complete to decide foldability of (1) an orthogonal crease pattern on an orthogonal piece of paper, (2) a crease pattern of axis-parallel and diagonal (45-degree) creases on a square piece of paper, and (3) crease patterns without a mountain/valley assignment. Comment: 24 pages, 19 figures. Version 3 includes several improvements thanks to referees, including formal definitions of simple folds, more figures, table summarizing results, new open problems, and additional references

    Analysis of airplane boarding via space-time geometry and random matrix theory

    We show that airplane boarding can be asymptotically modeled by 2-dimensional Lorentzian geometry. Boarding time is given by the maximal proper time among curves in the model. Discrepancies between the model and simulation results are closely related to random matrix theory. We then show how such models can be used to explain why some commonly practiced airline boarding policies are ineffective and even detrimental. Comment: 4 pages

    Generating Abstractive Summaries from Meeting Transcripts

    Summaries of meetings are very important as they convey the essential content of discussions in a concise form. Generally, it is time-consuming to read and understand whole documents. Therefore, summaries play an important role, as readers are interested only in the important content of discussions. In this work, we address the task of meeting document summarization. Automatic summarization systems for meeting conversations developed so far have been primarily extractive, resulting in unacceptable summaries that are hard to read. The extracted utterances contain disfluencies that affect the quality of the extractive summaries. To make summaries much more readable, we propose an approach to generating abstractive summaries by fusing important content from several utterances. We first separate meeting transcripts into various topic segments, and then identify the important utterances in each segment using a supervised learning approach. The important utterances are then combined to generate a one-sentence summary. In the text generation step, the dependency parses of the utterances in each segment are combined to create a directed graph. The most informative and well-formed sub-graph obtained by integer linear programming (ILP) is selected to generate a one-sentence summary for each topic segment. The ILP formulation reduces disfluencies by leveraging grammatical relations that are more prominent in non-conversational styles of text, and therefore generates summaries that are comparable to human-written abstractive summaries. Experimental results show that our method can generate more informative summaries than the baselines. In addition, readability assessments by human judges as well as log-likelihood estimates obtained from the dependency parser show that our generated summaries are significantly readable and well-formed. Comment: 10 pages, Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng' 201

    The Lazy Bureaucrat Scheduling Problem

    We introduce a new class of scheduling problems in which the optimization is performed by the worker (single "machine") who performs the tasks. A typical worker's objective is to minimize the amount of work he does (he is "lazy"), or more generally, to schedule as inefficiently (in some sense) as possible. The worker is subject to the constraint that he must be busy when there is work that he can do; we make this notion precise both in the preemptive and nonpreemptive settings. The resulting class of "perverse" scheduling problems, which we denote "Lazy Bureaucrat Problems," gives rise to a rich set of new questions that explore the distinction between maximization and minimization in computing optimal schedules. Comment: 19 pages, 2 figures, LaTeX. To appear, Information and Computation

    Optimal Paths in Complex Networks with Correlated Weights: The World-wide Airport Network

    We study complex networks with weights, $w_{ij}$, associated with each link connecting nodes $i$ and $j$. The weights are chosen to be correlated with the network topology in the form found in two real-world examples, (a) the world-wide airport network, and (b) the E. coli metabolic network. Here $w_{ij} \sim x_{ij} (k_i k_j)^\alpha$, where $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, $x_{ij}$ is a random number and $\alpha$ represents the strength of the correlations. The case $\alpha > 0$ represents correlation between weights and degree, while $\alpha < 0$ represents anti-correlation and the case $\alpha = 0$ reduces to the case of no correlations. We study the scaling of the lengths of the optimal paths, $\ell_{\rm opt}$, with the system size $N$ in strong disorder for scale-free networks for different $\alpha$. We calculate the robustness of correlated scale-free networks with different $\alpha$, and find the networks with $\alpha < 0$ to be the most robust networks when compared to the other values of $\alpha$. We propose an analytical method to study percolation phenomena on networks with this kind of correlation. We compare our simulation results with the real world-wide airport network, and we find good agreement.
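    The weight rule $w_{ij} \sim x_{ij} (k_i k_j)^\alpha$ can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how $x_{ij}$ is distributed, so it is drawn uniformly from [0, 1) here, and the function name `correlated_weights` and the edge-list input format are hypothetical.

```python
import random


def correlated_weights(edges, alpha, rng=None):
    """Assign w_ij = x_ij * (k_i * k_j) ** alpha to each undirected edge.

    edges: list of (i, j) node pairs, each edge listed once.
    alpha: correlation strength (alpha = 0 gives uncorrelated weights).
    rng:   optional random.Random; x_ij is assumed uniform on [0, 1),
           which is a choice of this sketch and not taken from the abstract.
    """
    rng = rng or random.Random(0)

    # Node degrees k_i, counted from the edge list.
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1

    weights = {}
    for i, j in edges:
        x_ij = rng.random()
        weights[(i, j)] = x_ij * (degree[i] * degree[j]) ** alpha
    return weights


if __name__ == "__main__":
    # Star graph: the hub (node 0) has degree 3, each leaf has degree 1.
    star = [(0, 1), (0, 2), (0, 3)]
    for alpha in (-1.0, 0.0, 1.0):
        print(alpha, correlated_weights(star, alpha, random.Random(1)))
```

    With a fixed seed, changing $\alpha$ rescales each weight by the degree product of its endpoints, which makes the effect of correlation ($\alpha > 0$) versus anti-correlation ($\alpha < 0$) easy to inspect on small graphs.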

    Transport in weighted networks: Partition into superhighways and roads

    Transport in weighted networks is dominated by the minimum spanning tree (MST), the tree connecting all nodes with the minimum total weight. We find that the MST can be partitioned into two distinct components, having significantly different transport properties, characterized by centrality, the number of times a node (or link) is used by transport paths. One component, the superhighways, is the infinite incipient percolation cluster, for which we find that nodes (or links) with high centrality dominate. For the other component, the roads, which includes the remaining nodes, low-centrality nodes dominate. We also find that the distribution of the centrality for the infinite incipient percolation cluster satisfies a power law, with an exponent smaller than that for the entire MST. The significance of this finding is that one can significantly improve the global transport by improving a tiny fraction of the network, the superhighways. Comment: 12 pages, 5 figures
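    For readers unfamiliar with the MST, the following minimal sketch builds one with Kruskal's algorithm and a union-find structure. The abstract does not state which MST algorithm the authors used, so Kruskal's is an assumption made here for illustration; the function name `minimum_spanning_tree` and the `(weight, u, v)` edge format are hypothetical, and nodes are assumed to be labelled 0 to `num_nodes - 1`.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Return the MST edges of a connected weighted graph (Kruskal's algorithm).

    num_nodes: number of nodes, labelled 0 .. num_nodes - 1.
    edges:     list of (weight, u, v) tuples for an undirected graph.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the set representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Consider edges from lightest to heaviest; keep those joining two components.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


if __name__ == "__main__":
    example = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (4.0, 2, 3)]
    print(minimum_spanning_tree(4, example))
```

    The partition into superhighways and roads described in the abstract is a further analysis on top of such a tree and is not attempted here.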